D4C, a band-aperiodicity estimator for high-quality speech synthesis

Author

  • Masanori Morise
Abstract

An algorithm is proposed for estimating the band aperiodicity of speech signals, where “aperiodicity” is defined as the power ratio between the speech signal and the aperiodic component of the signal. Since this power ratio depends on the frequency band, the aperiodicity should be given for several frequency bands. The proposed D4C (Definitive Decomposition Derived Dirt-Cheap) estimator is based on an extension of a temporally static group delay representation of periodic signals. In this paper, the principle and algorithm of D4C are explained, and its effectiveness is discussed with reference to objective and subjective evaluations. Evaluation results indicate that a speech synthesis system using D4C can synthesize natural speech better than ones using other algorithms. © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
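The band-wise power ratio in this definition can be illustrated with a minimal sketch. This is not the D4C algorithm: it assumes the aperiodic component is already known (D4C's task is to estimate the ratio from the speech signal alone), and it assumes the ratio is taken as aperiodic power over total power, a direction the abstract does not specify. The function name `band_aperiodicity`, the band edges, and the test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def band_aperiodicity(signal, aperiodic, fs, band_edges_hz):
    """Per-band ratio of aperiodic power to total signal power.

    Illustrative only: the aperiodic component is given here, whereas a
    real estimator such as D4C has to infer the ratio from speech alone.
    """
    signal = np.asarray(signal, dtype=float)
    aperiodic = np.asarray(aperiodic, dtype=float)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power_signal = np.abs(np.fft.rfft(signal)) ** 2
    power_aperiodic = np.abs(np.fft.rfft(aperiodic)) ** 2

    ratios = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        total = power_signal[in_band].sum()
        # Guard against empty or silent bands.
        ratios.append(power_aperiodic[in_band].sum() / total if total > 0 else 0.0)
    return np.array(ratios)


# Hypothetical test signal: a 200 Hz sinusoid plus weak white noise.
fs = 16000
t = np.arange(fs) / fs                      # one second of samples
rng = np.random.default_rng(0)
periodic = np.sin(2 * np.pi * 200 * t)
noise = 0.1 * rng.standard_normal(fs)       # stands in for the aperiodic component
speech_like = periodic + noise

ratios = band_aperiodicity(speech_like, noise, fs, [0, 1000, 4000, 8000])
print(ratios)
```

In this toy signal the lowest band is dominated by the sinusoid, so its ratio is close to 0, while the upper bands contain only noise and their ratios are close to 1. This is the frequency dependence that motivates reporting aperiodicity per band.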

Similar resources

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis

This paper introduces a general and flexible framework for F0 and aperiodicity (additive non-periodic component) analysis, specifically intended for high-quality speech synthesis and modification applications. The proposed framework consists of three subsystems: instantaneous frequency estimator and initial aperiodicity detector, F0 trajectory tracker, and F0 refinement and aperiodicity extract...

Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT

A new control paradigm of source signals for high quality speech synthesis is introduced to handle a variety of speech quality, based on time-frequency analyses by the use of an instantaneous frequency and group delay. The proposed signal representation consists of a frequency domain aperiodicity measure and a time domain energy concentration measure to represent source attributes, which supplem...

Parameterization of vocal fry in HMM-based speech synthesis

HMM-based speech synthesis offers a way to generate speech with different voice qualities. However, sometimes databases contain certain inherent voice qualities that need to be parametrized properly. One example of this is vocal fry typically occurring at the end of utterances. A popular mixed excitation vocoder for HMM-based speech synthesis is STRAIGHT. The standard STRAIGHT is optimized for ...

Towards minimum perceptual error training for DNN-based speech synthesis

We propose to use a perceptually-oriented domain to improve the quality of text-to-speech generated by deep neural networks (DNNs). We train a DNN that predicts the parameters required for speech reconstruction but whose cost function is calculated in another domain. In this paper, to represent this perceptual domain we extract an approximated version of the Spectro-Temporal Excitation Pattern t...

Aperiodicity Analysis for Quality Estimation of Text-to-Speech Signals

This contribution presents a new approach towards nonintrusive quality assessment of text-to-speech (TTS) signals. Perturbation measures which capture the degree of excitation-specific aperiodicity in voiced speech are investigated concerning their quality implications in synthesized speech. Based on two independent TTS databases for which formal attribute-based listening tests have been conducte...

Journal:
  • Speech Communication

Volume 84

Publication date: 2016